242 research outputs found
Multi-Path Relay Selection Algorithm Based on the Broadcast TV
This paper presents a relay selection method for Broadcast TV services. The method uses each node's delay and power information to compute a system outage value, which serves as a decision threshold for choosing the relay node. The paper further proposes an optimal relay selection strategy, the Multi-Path Relay Routing Protocol, which combines relay selection with power allocation to minimize the system outage probability. The protocol dynamically adapts its route to changing network conditions. Simulation results show that the protocol extends the coverage area, reduces delay, increases system throughput, improves spectral efficiency, and enhances the QoS of the Broadcast TV service.
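The core idea above — score each candidate relay by its outage probability and pick the minimum — can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the Rayleigh-fading outage formula, the `snr_db`/`delay_ms` fields, and the 3 dB threshold are all assumptions introduced here.

```python
import math

def outage_probability(snr_db, threshold_db=3.0):
    """Approximate outage probability of a Rayleigh-fading link:
    P_out = 1 - exp(-gamma_th / gamma_avg)  (assumed model)."""
    gamma_avg = 10 ** (snr_db / 10)   # average SNR, linear scale
    gamma_th = 10 ** (threshold_db / 10)  # outage threshold, linear scale
    return 1.0 - math.exp(-gamma_th / gamma_avg)

def select_relay(relays):
    """Pick the relay whose link yields the lowest outage probability.
    Each relay is a dict with hypothetical 'snr_db'/'delay_ms' fields."""
    scored = [(outage_probability(r["snr_db"]), r) for r in relays]
    return min(scored, key=lambda s: s[0])[1]

relays = [
    {"id": "A", "snr_db": 10.0, "delay_ms": 12},
    {"id": "B", "snr_db": 15.0, "delay_ms": 20},
]
best = select_relay(relays)  # higher average SNR -> lower outage
```

A real protocol would fold the delay and power terms into the decision as well; the single-metric minimum here only illustrates the thresholding step.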
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
While multi-modal foundation models pre-trained on large-scale data have been
successful in natural language understanding and vision recognition, their use
in medical domains is still limited due to the fine-grained nature of medical
tasks and the high demand for domain knowledge. To address this challenge, we
propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which
leverages existing medical domain knowledge to guide vision-language
pre-training using paired chest X-rays and radiology reports. We evaluate KAD
on four external X-ray datasets and demonstrate that its zero-shot
performance is not only comparable to that of fully-supervised models, but also
superior to the average of three expert radiologists for three (out of five)
pathologies with statistical significance. Moreover, when few-shot annotation
is available, KAD outperforms all existing approaches in fine-tuning settings,
demonstrating its potential for application in different clinical scenarios.
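Zero-shot evaluation of this kind typically reduces to comparing an image embedding against text embeddings of each candidate pathology. A minimal sketch of that scoring step, assuming generic CLIP-style embeddings (the random vectors below are stand-ins, not KAD's actual encoders):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and per-pathology
    text embeddings; a higher score suggests the finding is present."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img  # one similarity per pathology

rng = np.random.default_rng(0)
image_emb = rng.normal(size=64)        # placeholder image embedding
text_embs = rng.normal(size=(5, 64))   # placeholder: 5 pathology prompts
scores = zero_shot_scores(image_emb, text_embs)
```

In practice the text embeddings would come from knowledge-enhanced pathology descriptions rather than raw class names.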
Unsupervised Domain Adaptive Fundus Image Segmentation with Few Labeled Source Data
Deep learning-based segmentation methods have been widely employed for
automatic glaucoma diagnosis and prognosis. In practice, fundus images obtained
by different fundus cameras vary significantly in terms of illumination and
intensity. Although recent unsupervised domain adaptation (UDA) methods enhance
the models' generalization ability on the unlabeled target fundus datasets,
they always require sufficient labeled data from the source domain, bringing
auxiliary data acquisition and annotation costs. To further facilitate the data
efficiency of the cross-domain segmentation methods on the fundus images, we
explore UDA optic disc and cup segmentation problems using few labeled source
data in this work. We first design a Searching-based Multi-style Invariant
Mechanism to diversify the source data style as well as increase the data
amount. Next, a prototype consistency mechanism on the foreground objects is
proposed to facilitate the feature alignment for each kind of tissue under
different image styles. Moreover, a cross-style self-supervised learning stage
is further designed to improve the segmentation performance on the target
images. Our method outperforms several state-of-the-art UDA segmentation
methods on UDA fundus segmentation with few labeled source data.
Comment: Accepted by the 33rd British Machine Vision Conference (BMVC) 2022.
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology
In this paper, we consider enhancing medical visual-language pre-training
(VLP) with domain-specific knowledge, by exploiting the paired image-text
reports from the radiological daily practice. In particular, we make the
following contributions: First, unlike existing works that directly process the
raw reports, we adopt a novel triplet extraction module to extract the
medical-related information, avoiding unnecessary complexity from language
grammar and enhancing the supervision signals; Second, we propose a novel
triplet encoding module with entity translation by querying a knowledge base,
to exploit the rich domain knowledge in medical field, and implicitly build
relationships between medical entities in the language embedding space; Third,
we propose to use a Transformer-based fusion model for spatially aligning the
entity description with visual signals at the image patch level, enabling the
ability for medical diagnosis; Fourth, we conduct thorough experiments to
validate the effectiveness of our architecture, and benchmark on numerous
public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax,
COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning
settings, our model demonstrates strong performance compared with prior
methods on disease classification and grounding.
Towards Generalist Foundation Model for Radiology
In this study, we aim to initiate the development of Radiology Foundation
Model, termed RadFM. We consider the construction of foundational models from
the perspectives of data, model design, and evaluation thoroughly. Our
contribution can be concluded as follows: (i), we construct a large-scale
Medical Multi-modal Dataset, MedMD, consisting of 16M 2D and 3D medical scans.
To the best of our knowledge, this is the first multi-modal dataset containing
3D medical scans. (ii), We propose an architecture that enables visually
conditioned generative pre-training, allowing for the integration of text input
interleaved with 2D or 3D medical scans to generate responses for diverse
radiologic tasks. The model was initially pre-trained on MedMD and subsequently
domain-specific fine-tuned on RadMD, a radiologic cleaned version of MedMD,
containing 3M radiologic visual-language pairs. (iii), we propose a new
evaluation benchmark that comprises five tasks, aiming to comprehensively
assess the capability of foundation models in handling practical clinical
problems. Our experimental results confirm that RadFM significantly outperforms
existing multi-modal foundation models. The codes, data, and model checkpoint
will all be made publicly available to promote further research and development
in the field.
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
Training an image captioner without annotated image-sentence pairs has gained
traction in recent years. Previous approaches can be categorized into two
strategies: crawling sentences from mismatched corpora and aligning them with
the given images as pseudo annotations, or pre-training the captioner using
external image-text pairs. However, the alignment setting seems to have reached
its performance limit due to the low quality of the pairs, and pre-training
requires significant computational resources. To address these challenges, we
propose a new strategy "LPM + retrieval-augmented learning" where the prior
knowledge from large pre-trained models (LPMs) is leveraged as supervision, and
a retrieval process is integrated to further reinforce its effectiveness.
Specifically, we introduce Retrieval-augmented Pseudo Sentence Generation
(RaPSG), which efficiently retrieves highly relevant short region descriptions
from the mismatched corpora and uses them to prompt LPMs to generate diverse,
high-quality pseudo sentences with distinct representations. In addition, a
fluency filter and a CLIP-guided training
objective are further introduced to facilitate model optimization. Experimental
results demonstrate that our method surpasses the SOTA pre-training model
(Flamingo3B) by achieving a CIDEr score of 78.1 (+5.1) while utilizing only
0.3% of its trainable parameters (1.3B vs. 33M). Importantly, our approach
eliminates the need for computationally expensive pre-training processes on
external datasets (e.g., the requirement of 312M image-text pairs for
Flamingo3B). We further show that with a simple extension, the generated pseudo
sentences can be deployed as weak supervision to boost the 1% semi-supervised
image caption benchmark up to a 93.4 CIDEr score (+8.9), which showcases the
versatility and effectiveness of our approach.
Comment: 10 pages, 5 figures.
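The retrieval step at the heart of RaPSG — find the corpus descriptions most similar to an image before generating pseudo sentences — can be sketched as a top-k cosine search. The embeddings below are random stand-ins for CLIP-style features, and the function name is ours, not the paper's.

```python
import numpy as np

def retrieve_descriptions(image_emb, corpus_embs, corpus_texts, k=3):
    """Return the k corpus sentences most similar (cosine) to the image
    embedding, mimicking retrieval before pseudo-sentence generation."""
    img = image_emb / np.linalg.norm(image_emb)
    corp = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = corp @ img
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [corpus_texts[i] for i in top]

corpus_texts = ["a dog on grass", "a red car", "a cat indoors"]
rng = np.random.default_rng(1)
corpus_embs = rng.normal(size=(3, 8))
image_emb = corpus_embs[1].copy()  # image embedding aligned with "a red car"
top = retrieve_descriptions(image_emb, corpus_embs, corpus_texts, k=2)
```

The retrieved snippets would then be handed to the LPM as context, with the fluency filter pruning its weaker generations.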